Title: Exploring the World Health Organization (WHO) World Health Statistics from 2024
Author: Alexa Neal
In this project, I wanted to explore different health and health-related indicators. To do this, I used the World Health Organization (WHO) World Health Statistics report for 2024. I combined this data set with another one from the United Nations Development Programme detailing the Human Development Index (HDI) for each country. Countries with an HDI value of 0.700 or above have high to very high levels of human development, whereas those with an HDI value below 0.700 have medium to low human development. This threshold was used to separate the data set into developed and developing countries for comparison.
Rows: 10,515
Columns: 7
$ Ind.Name <chr> "Adolescent birth rate (per 1000 women)", "Adolescent bi…
$ Country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanista…
$ Dim.Geo.Code <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", …
$ Dim.1.Code <chr> "AGEGROUP_YEARS15-19", "AGEGROUP_YEARS10-14", "SEX_BTSX"…
$ Numeric.Value <dbl> 62.00000, 18.00000, 265.66452, 40.20000, 19.22259, 22.70…
$ HDI.Rank <int> 182, 182, 182, 182, 182, 182, 182, 182, 182, 182, 182, 1…
$ HDI.value <dbl> 0.462, 0.462, 0.462, 0.462, 0.462, 0.462, 0.462, 0.462, …
The developed countries are mostly located in North America, South America, Europe, Asia, and Australia, with a few located in northern and southern Africa. Furthermore, most of these countries have an HDI value in the lower part of the developed range (about 0.70 to 0.80). The developing countries are mostly located in Africa and Asia, and they appear more concentrated in a few areas than the developed countries, which are spread more widely across the globe.
All of the correlograms demonstrate a strong positive correlation between the density of medical doctors (MDs) and HDI value, and a strong negative correlation between the density of MDs and HDI ranking.
There is a strong negative correlation between the density of MDs and the incidence of malaria. The incidence of a disease is the number of new cases arising in a population over a given period, here expressed per 1000 population at risk. Thus, countries with a higher density of MDs tend to have fewer new malaria cases. Furthermore, there is a strong negative correlation between the incidence of malaria and HDI value: countries with a higher incidence of malaria tend to have a lower HDI value, indicating that they are less developed.
Similar results were found for the other diseases explored. As the density of MDs increased, the incidence of the disease decreased. Additionally, as the incidence of a disease increased, HDI value decreased. However, these correlations were weaker for HIV than for malaria or tuberculosis.
Life expectancy is defined as the average length of life in a certain population. For developed countries, the distribution is bimodal, with one peak at 73 years and another at 82 years. There is an outlier at 61 years, and the maximum value for life expectancy is 84 years. For developing countries, the distribution is closer to unimodal, although two peaks are still visible, one at 62 years and another at 68 years. The minimum value for life expectancy is 51 years and the maximum value is 75 years. Developed countries have higher values for life expectancy and a narrower spread of values than developing countries.
Healthy life expectancy is defined as the average number of years spent in full health. For developed countries, the distribution is unimodal, with a peak at 64 years. There is an outlier at 53 years, and the maximum value for healthy life expectancy is 74 years. For developing countries, the distribution is also unimodal, with two adjacent peaks at 59 and 60 years. The minimum value for healthy life expectancy is 45 years and the maximum value is 65 years. As with life expectancy, developed countries have higher values for healthy life expectancy. However, developed and developing countries have a similar spread of values, and people in developing countries actually spend a greater percentage of their lives in full health: about 88% to 97% of their life expectancy, compared with about 78% to 88% in developed countries.
The other health indicators chosen are ones that are not necessarily related to one’s physical or mental health but can still predict the health of a population. There is a strong positive correlation between healthy life expectancy and the proportion of people using safe drinking water. This indicates that healthy life expectancy increases as the access to safe drinking water increases. Similarly, there is a strong positive correlation between healthy life expectancy and reliance on clean technologies, and a strong positive correlation between healthy life expectancy and access to hand-washing facilities. Thus, healthy life expectancy increases as access to resources that can provide a cleaner environment increases.
The development of a country can impact its life expectancy and the health of its population. A more developed country tends to have a higher life expectancy and better health. Additionally, access to resources such as clean water, hand-washing facilities, and clean technology can also impact the overall health of a country. Safe drinking water is a stronger predictor of healthy life expectancy than hand-washing and clean technology. Furthermore, the density of MDs is strongly correlated with the incidence of malaria, HIV, and tuberculosis, and it is a better predictor of the incidence of malaria and tuberculosis than of HIV.
The biggest limitation was the size of the data set. It had multiple indicators to choose from for each country and, because of time constraints, I was not able to explore all of them. Therefore, there are likely some insights and relationships that I missed. Furthermore, not all countries were included in the combined data set, most likely because the two data sets did not cover the same countries or used different names for them. Thus, the analysis is missing information from countries that could affect the results.
In the future, it would be helpful to explore the other indicators found in the data set. This way, more insights on health and health-related indicators can be discovered for future use. Furthermore, these insights can be helpful when considering ways to increase the development or health of a country. For example, it may be beneficial to increase access to medical doctors or to clean water and fuel in order to increase healthy life expectancy.
---
title: "WHO 2024 Health Statistics"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: flatly
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(DT)
library(tidyverse)
library(pacman)
```
Introduction
===
Column {data-width=550}
---
### Background
**Title:** Exploring the World Health Organization (WHO) World Health Statistics from 2024
**Author:** Alexa Neal
In this project, I wanted to explore different health and health-related indicators. To do this, I used the [World Health Organization (WHO) World Health Statistics report](https://data.who.int/) for 2024. I combined this data set with another one from the [United Nations Development Programme](https://hdr.undp.org/data-center/human-development-index#/indicies/HDI) detailing the Human Development Index (HDI) for each country. Countries with an HDI value of 0.700 or above have high to very high levels of human development, whereas those with an HDI value below 0.700 have medium to low human development. This threshold was used to separate the data set into developed and developing countries for comparison.
### Research Questions
1. What factors influence life expectancy?
2. How is the density of healthcare providers related to the incidence of certain diseases?
3. How does the development of a country impact health-related indicators?
Column {.tabset data-width=450}
---
### Variables
- **Ind.Name** = Indicator name
- **Country** = Name of country
- **Dim.Geo.Code** = Three-letter geographic code for each country
- **Dim.1.Code** = Variable dividing each indicator by age group or sex
- **Numeric.Value** = Numeric value corresponding to each indicator
- **HDI.Rank** = Country ranking based on its HDI value; a smaller ranking means it has a higher HDI
- **HDI.value** = Specific HDI numerical value for each country
### Glimpse of the Data
```{r}
#reading the datasets
WHO<-read.csv("C:/Users/write/OneDrive/Desktop/school/data m&m/in-class labs/WHO.csv")
HDI<-read.csv("C:/Users/write/OneDrive/Desktop/school/data m&m/in-class labs/HDI.csv")
#combining and cleaning the data
WHO <- rename (WHO,Country=DIM_GEO_NAME)
combine <- full_join(WHO,HDI,by="Country")
clean <- combine %>% subset(select=-c(IND_CODE,DIM_TIME_YEAR,
VALUE_STRING,VALUE_COMMENTS,Life.expectancy.at.birth..years.,
Expected.years.of.schooling..years.,
Mean.years.of.schooling..years.,
Gross.national.income..GNI..per.capita..2017.PPP...,
GNI.per.capita.rank.minus.HDI.rank, HDI.rank.1))
final <-clean %>% rename (Numeric.Value=VALUE_NUMERIC, HDI.Rank=HDI.rank,
HDI.value=Human.Development.Index..HDI.,Ind.Name=IND_NAME,
Dim.Geo.Code=DIM_GEO_CODE,Dim.1.Code=DIM_1_CODE)
glimpse(final)
```
Map of Countries
===
Column {.tabset data-width=700}
---
### Developed Countries
```{r}
# Split countries at the HDI threshold: 0.700 and above counts as developed
developed <- final %>% filter(HDI.value >= 0.700)
developing <- final %>% filter(HDI.value < 0.700)
# Developed map (tidyverse is already loaded in the setup chunk)
map <- map_data("world")
unique_country_developed<-distinct(developed,Country,.keep_all = TRUE)
developed_map <- unique_country_developed %>%
left_join(map,by=c("Country"="region"))
developed.countries <- ggplot(developed_map,aes(long,lat,group=group))+ geom_polygon(aes(fill=HDI.value), colour="white") +
scale_fill_viridis_c(option="C") + labs(fill= "HDI Value")+
theme_void() + theme(legend.position="bottom")
library(plotly)
ggplotly(developed.countries)
```
### Developing Countries
```{r}
unique_country_developing<-distinct(developing,Country,.keep_all = TRUE)
developing_map <- unique_country_developing %>%
left_join(map,by=c("Country"="region"))
developing.countries<- ggplot(developing_map,aes(long,lat,group=group))+ geom_polygon(aes(fill=HDI.value), colour="white") +
scale_fill_viridis_c(option="C") + labs(fill= "HDI Value")+
theme_void() + theme(legend.position="bottom")
ggplotly(developing.countries)
```
Column {data-width=300}
---
### Analysis
The developed countries are mostly located in North America, South America, Europe, Asia, and Australia, with a few located in northern and southern Africa. Furthermore, most of these countries have an HDI value in the lower part of the developed range (about 0.70 to 0.80). The developing countries are mostly located in Africa and Asia, and they appear more concentrated in a few areas than the developed countries, which are spread more widely across the globe.
Disease
===
Column {.tabset data-width=700}
---
### Malaria
```{r}
mds<- filter(final,Ind.Name=="Density of medical doctors (per 10 000 population) ")
healthylife<- filter(final,Ind.Name=="Healthy life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
lifeexpect<- filter(final,Ind.Name=="Life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
malaria<- filter(final,Ind.Name=="Malaria incidence (per 1000 population at risk)")
HIV<- filter(final,Ind.Name=="New HIV infections (per 1000 uninfected population)")
tuberculosis<- filter(final,Ind.Name=="Tuberculosis incidence (per 100 000 population)")
handwashing<-filter(final,Ind.Name=="Proportion of population using a hand-washing facility with soap and water (%)")
cleanfuel<- filter(final,Ind.Name=="Proportion of population with primary reliance on clean fuels and technology (%)")
water<-filter(final,Ind.Name=="Proportion of population using safely-managed drinking-water services (%)")
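library(corrgram)
# Keep only countries present in both indicators and sort so the rows line up
mds_common1 <- mds %>% filter(Country %in% malaria$Country) %>% arrange(Country)
malaria_common <- malaria %>% filter(Country %in% mds$Country) %>% arrange(Country)
# Bind the two tables side by side; columns are prefixed "MDs." and "Malaria."
mds.malaria <- data.frame(MDs = mds_common1,
Malaria = malaria_common)
# corrgram() ignores the non-numeric columns and correlates the numeric ones
corrgram(mds.malaria, main = "Correlogram of Density of medical doctors and Malaria incidence", cex.main=1)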
```
### HIV
```{r}
mds_common2 <- mds %>% filter(Country %in% HIV$Country) %>% arrange(Country)
HIV_common <- HIV %>% filter(Country %in% mds$Country) %>% arrange(Country)
mds.HIV <- data.frame(MDs = mds_common2,
HIV = HIV_common)
corrgram(mds.HIV, main = "Correlogram of Density of medical doctors and New HIV infections", cex.main=1)
```
### Tuberculosis
```{r}
mds_common3 <- mds %>% filter(Country %in% tuberculosis$Country) %>% arrange(Country)
tuberculosis_common <- tuberculosis %>% filter(Country %in% mds$Country) %>% arrange(Country)
mds.tuberculosis <- data.frame(MDs = mds_common3,
TB = tuberculosis_common)
corrgram(mds.tuberculosis, main = "Correlogram of Density of medical doctors and Tuberculosis incidence", cex.main=1)
```
Column {data-width=300}
---
### Analysis
All of the correlograms demonstrate a strong positive correlation between the density of medical doctors (MDs) and HDI value, and a strong negative correlation between the density of MDs and HDI ranking.
There is a strong negative correlation between the density of MDs and the incidence of malaria. The incidence of a disease is the number of new cases arising in a population over a given period, here expressed per 1000 population at risk. Thus, countries with a higher density of MDs tend to have fewer new malaria cases. Furthermore, there is a strong negative correlation between the incidence of malaria and HDI value: countries with a higher incidence of malaria tend to have a lower HDI value, indicating that they are less developed.
Similar results were found for the other diseases explored. As the density of MDs increased, the incidence of the disease decreased. Additionally, as the incidence of a disease increased, HDI value decreased. However, these correlations were weaker for HIV than for malaria or tuberculosis.
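
The correlations described here depend on each country's doctor density being paired with the same country's disease incidence. As a minimal sketch of one way to make that pairing explicit, the example below joins two small tables by `Country` and computes a Pearson correlation. The tables `mds_toy` and `malaria_toy` and all of their values are invented for illustration and are not taken from the WHO data; this is an alternative to the filter-and-sort approach used in the dashboard code, not a description of it.

```r
library(dplyr)

# Hypothetical toy values; NOT taken from the WHO data.
mds_toy <- data.frame(Country = c("A", "B", "C", "D"),
                      Numeric.Value = c(2, 10, 25, 40))
malaria_toy <- data.frame(Country = c("D", "B", "A", "E"),
                          Numeric.Value = c(1, 120, 300, 50))

# inner_join() keeps only countries present in both tables and places
# each doctor density next to the incidence for the same country.
paired <- inner_join(mds_toy, malaria_toy, by = "Country",
                     suffix = c(".mds", ".malaria"))

# Pearson correlation on the paired values.
r <- cor(paired$Numeric.Value.mds, paired$Numeric.Value.malaria)
r
```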
Life Expectancy
===
Column {.tabset data-width=750}
---
### Developed Countries
```{r}
#filtering (developed)
mds.developed<- filter(developed,Ind.Name=="Density of medical doctors (per 10 000 population) ")
healthylife.developed<- filter(developed,Ind.Name=="Healthy life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
lifeexpect.developed<- filter(developed,Ind.Name=="Life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
malaria.developed<- filter(developed,Ind.Name=="Malaria incidence (per 1000 population at risk)")
HIV.developed<- filter(developed,Ind.Name=="New HIV infections (per 1000 uninfected population)")
tuberculosis.developed <- filter(developed,Ind.Name=="Tuberculosis incidence (per 100 000 population)")
handwashing.developed<-filter(developed,Ind.Name=="Proportion of population using a hand-washing facility with soap and water (%)")
cleanfuel.developed<- filter(developed,Ind.Name=="Proportion of population with primary reliance on clean fuels and technology (%)")
water.developed<-filter(developed,Ind.Name=="Proportion of population using safely-managed drinking-water services (%)")
#filtering (developing)
mds.developing<- filter(developing,Ind.Name=="Density of medical doctors (per 10 000 population) ")
healthylife.developing<- filter(developing,Ind.Name=="Healthy life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
lifeexpect.developing<- filter(developing,Ind.Name=="Life expectancy at birth (years)" &
Dim.1.Code=="SEX_BTSX")
malaria.developing<- filter(developing,Ind.Name=="Malaria incidence (per 1000 population at risk)")
HIV.developing<- filter(developing,Ind.Name=="New HIV infections (per 1000 uninfected population)")
tuberculosis.developing <- filter(developing,Ind.Name=="Tuberculosis incidence (per 100 000 population)")
handwashing.developing<-filter(developing,Ind.Name=="Proportion of population using a hand-washing facility with soap and water (%)")
cleanfuel.developing<- filter(developing,Ind.Name=="Proportion of population with primary reliance on clean fuels and technology (%)")
water.developing<-filter(developing,Ind.Name=="Proportion of population using safely-managed drinking-water services (%)")
#histogram
l1<-ggplot(lifeexpect.developed, aes(x = Numeric.Value)) +
geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
labs(title = "Histogram of Life Expectancy at Birth of Developed Countries",
x = "Life Expectancy (years)",
y = "Count")
library(plotly)
ggplotly(l1)
```
### Developing Countries
```{r}
l2<-ggplot(lifeexpect.developing, aes(x = Numeric.Value)) +
geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
labs(title = "Histogram of Life Expectancy at Birth of Developing Countries",
x = "Life Expectancy (years)",
y = "Count")
ggplotly(l2)
```
Column {data-width=350}
---
### Analysis
Life expectancy is defined as the average length of life in a certain population. For developed countries, the distribution is bimodal, with one peak at 73 years and another at 82 years. There is an outlier at 61 years, and the maximum value for life expectancy is 84 years. For developing countries, the distribution is closer to unimodal, although two peaks are still visible, one at 62 years and another at 68 years. The minimum value for life expectancy is 51 years and the maximum value is 75 years. Developed countries have higher values for life expectancy and a narrower spread of values than developing countries.
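
The comparison of spread between the two groups can also be expressed numerically. The sketch below summarises each group with its range and standard deviation. The data frame `le_toy` and its values are hypothetical and only mimic the column names used in this dashboard; they are not the WHO figures.

```r
library(dplyr)

# Hypothetical life expectancy values (years); NOT taken from the WHO data.
le_toy <- data.frame(
  Group = rep(c("developed", "developing"), each = 5),
  Numeric.Value = c(72, 74, 78, 81, 83, 52, 60, 63, 68, 74)
)

# Summarise the spread of each group with its range and standard deviation.
spread <- le_toy %>%
  group_by(Group) %>%
  summarise(
    lowest = min(Numeric.Value),
    highest = max(Numeric.Value),
    range_width = highest - lowest,
    std_dev = sd(Numeric.Value)
  )

spread
```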
Healthy Life Expectancy
===
Column {.tabset data-width=750}
---
### Developed Countries
```{r}
hl1<-ggplot(healthylife.developed, aes(x = Numeric.Value)) +
geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
labs(title = "Histogram of Healthy Life Expectancy at Birth of Developed Countries",
x = "Healthy Life Expectancy (years)",
y = "Count")
ggplotly(hl1)
```
### Developing Countries
```{r}
hl2<-ggplot(healthylife.developing, aes(x = Numeric.Value)) +
geom_histogram(binwidth = 1, fill = "lightblue", color = "black") +
labs(title = "Histogram of Healthy Life Expectancy at Birth of Developing Countries",
x = "Healthy Life Expectancy (years)",
y = "Count")
ggplotly(hl2)
```
Column {data-width=350}
---
### Analysis
Healthy life expectancy is defined as the average number of years spent in full health. For developed countries, the distribution is unimodal, with a peak at 64 years. There is an outlier at 53 years, and the maximum value for healthy life expectancy is 74 years. For developing countries, the distribution is also unimodal, with two adjacent peaks at 59 and 60 years. The minimum value for healthy life expectancy is 45 years and the maximum value is 65 years. As with life expectancy, developed countries have higher values for healthy life expectancy. However, developed and developing countries have a similar spread of values, and people in developing countries actually spend a greater percentage of their lives in full health: about 88% to 97% of their life expectancy, compared with about 78% to 88% in developed countries.
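
The percentages quoted here are ratios of healthy life expectancy to life expectancy. As a rough sketch, the hypothetical helper `healthy_share()` below computes that ratio. The inputs pair the developed-country peak and maximum values from this page (64 and 74 years) with those from the Life Expectancy page (82 and 84 years); which values were actually paired to produce the quoted ranges is an assumption.

```r
# Hypothetical helper: share of life expectancy spent in full health, in percent.
healthy_share <- function(healthy_years, total_years) {
  100 * healthy_years / total_years
}

# Developed countries, pairing the peak values (assumed pairing).
peak_share <- healthy_share(64, 82)

# Developed countries, pairing the maximum values (assumed pairing).
max_share <- healthy_share(74, 84)

round(c(peak = peak_share, max = max_share), 1)
```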
Other Health Indicators
===
Column {.tabset data-width=650}
---
### Safe Drinking Water
```{r}
library(dplyr)
water.same <- water %>% filter(Country %in% healthylife$Country) %>% arrange(Country)
healthylife1.same <- healthylife %>% filter(Country %in% water$Country) %>% arrange(Country)
df1 <- data.frame(WaterAccess = water.same$Numeric.Value,
HealthyLife = healthylife1.same$Numeric.Value)
library(ggplot2)
#install.packages("ggpubr")
library(ggpubr)
ggplot(df1, aes(x = WaterAccess, y = HealthyLife)) +
geom_point(color = "black") +
stat_cor(method = "pearson", label.x = 68, label.y = 43) +
labs(
title = "Healthy Life Expectancy vs Safe Drinking Water",
x = "Proportion of population using safely-managed drinking-water services (%)",
y = "Healthy Life Expectancy (years)") +theme_minimal()
```
### Clean Fuel and Tech
```{r}
cleanfuel.same <- cleanfuel %>% filter(Country %in% healthylife$Country) %>% arrange(Country)
healthylife2.same <- healthylife %>% filter(Country %in% cleanfuel$Country) %>% arrange(Country)
df2 <- data.frame(CleanFuel = cleanfuel.same$Numeric.Value,
HealthyLife = healthylife2.same$Numeric.Value)
ggplot(df2, aes(x = CleanFuel, y = HealthyLife)) +
geom_point(color = "black") +
stat_cor(method = "pearson", label.x = 68, label.y = 43) +
labs(
title = "Healthy Life Expectancy vs Reliance on Clean Resources",
x = "Proportion of population with primary reliance on clean fuels and technology (%)",
y = "Healthy Life Expectancy (years)") +theme_minimal()
```
### Hand-Washing
```{r}
handwashing.same <- handwashing %>% filter(Country %in% healthylife$Country) %>% arrange(Country)
healthylife3.same <- healthylife %>% filter(Country %in% handwashing$Country) %>% arrange(Country)
df3 <- data.frame(Handwashing = handwashing.same$Numeric.Value,
HealthyLife = healthylife3.same$Numeric.Value)
ggplot(df3, aes(x = Handwashing, y = HealthyLife)) +
geom_point(color="black") +
stat_cor(method = "pearson", label.x = 68, label.y = 43) +
labs(
title = "Healthy Life Expectancy vs Access to Hand-Washing Facilities",
x = "Proportion of population using a hand-washing facility with soap and water (%)",
y = "Healthy Life Expectancy (years)") + theme_minimal()
```
Column {data-width=350}
---
### Analysis
The other health indicators chosen are ones that are not necessarily related to one's physical or mental health but can still predict the health of a population. There is a strong positive correlation between healthy life expectancy and the proportion of people using safe drinking water. This indicates that healthy life expectancy increases as the access to safe drinking water increases. Similarly, there is a strong positive correlation between healthy life expectancy and reliance on clean technologies, and a strong positive correlation between healthy life expectancy and access to hand-washing facilities. Thus, healthy life expectancy increases as access to resources that can provide a cleaner environment increases.
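
To compare how strongly each resource is related to healthy life expectancy, the Pearson coefficients can be computed side by side and ranked. The sketch below does this with base R on a hypothetical data frame `toy`; every value in it is invented for illustration, so the resulting ordering says nothing about the real WHO data.

```r
# Hypothetical values for five countries; NOT taken from the WHO data.
toy <- data.frame(
  HealthyLife = c(50, 55, 60, 65, 70),
  Water       = c(20, 35, 55, 80, 95),
  CleanFuel   = c(10, 60, 30, 90, 85),
  Handwashing = c(15, 40, 70, 50, 90)
)

# Pearson correlation of each predictor with healthy life expectancy.
predictors <- c("Water", "CleanFuel", "Handwashing")
r_values <- sapply(toy[predictors], function(x) cor(x, toy$HealthyLife))

# Rank the predictors from strongest to weakest correlation.
sort(r_values, decreasing = TRUE)
```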
Conclusion
===
Column {data-width=500}
---
### Conclusion
The development of a country can impact its life expectancy and the health of its population. A more developed country tends to have a higher life expectancy and better health. Additionally, access to resources such as clean water, hand-washing facilities, and clean technology can also impact the overall health of a country. Safe drinking water is a stronger predictor of healthy life expectancy than hand-washing and clean technology. Furthermore, the density of MDs is strongly correlated with the incidence of malaria, HIV, and tuberculosis, and it is a better predictor of the incidence of malaria and tuberculosis than of HIV.
### Limitations
The biggest limitation was the size of the data set. It had multiple indicators to choose from for each country and, because of time constraints, I was not able to explore all of them. Therefore, there are likely some insights and relationships that I missed. Furthermore, not all countries were included in the combined data set, most likely because the two data sets did not cover the same countries or used different names for them. Thus, the analysis is missing information from countries that could affect the results.
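
One way to check which countries fail to match when two tables are joined by name is `dplyr::anti_join()`, which returns the rows of the first table that have no partner in the second. The sketch below uses two hypothetical tables, `who_toy` and `hdi_toy`, with invented country names; it illustrates the check and is not something that was run on the actual data sets.

```r
library(dplyr)

# Hypothetical country names; one is spelled differently in each table.
who_toy <- data.frame(Country = c("Country A (Republic of)", "Country B", "Country C"))
hdi_toy <- data.frame(Country = c("Country A", "Country B", "Country C"))

# Rows in the WHO-style table with no matching name in the HDI-style table.
unmatched_who <- anti_join(who_toy, hdi_toy, by = "Country")

# Rows in the HDI-style table with no matching name in the WHO-style table.
unmatched_hdi <- anti_join(hdi_toy, who_toy, by = "Country")

unmatched_who
unmatched_hdi
```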
Column {data-width=500}
---
### Future Directions
In the future, it would be helpful to explore the other indicators found in the data set. This way, more insights on health and health-related indicators can be discovered for future use. Furthermore, these insights can be helpful when considering ways to increase the development or health of a country. For example, it may be beneficial to increase access to medical doctors or to clean water and fuel in order to increase healthy life expectancy.
Author
===
Column {data-width=500}
---
### About the Author
My name is Alexa Neal and I am currently a junior at the University of Dayton. I am pursuing a Bachelor of Science in Premedicine, with minors in Data Analytics, Medicine and Society, and Neuroscience. My projected graduation is May 2026, and I plan to attend medical school after graduating.
Column {data-width=500}
---
###
```{r}
knitr::include_graphics("C:/Users/write/OneDrive/Desktop/photos/headshot.JPG")
```